ABSTRACT 

RNA-dependent RNA polymerase (RdRp) is essen- 
tial to viral replication and is therefore one of the 
primary targets of countermeasures against these 
dangerous infectious agents. Development of 
broad-spectrum therapeutics targeting polymerases 
has been hampered by the extreme sequence 
variability of these enzymes. RdRps range in 
length from 400-800 residues, yet contain only ~20 
residues that are conserved in most species. In this 
study, we made structure-based comparisons 
that are independent of sequence composition 
using a recently developed algorithm. We identified 
residue-to-residue correspondences of multiple 
protein structures and created (two-dimensional) 
structure-based alignment maps of 37 polymerase 
structures that provide both sequence and structure 
details. Using these maps, we determined that 
~75% of each polymerase species consists of 
seven protein segments, each of which has high 
structural similarity to segments in other species, 
though they are widely divergent in sequence com- 
position and order. We define each of these 
segments as a 'homomorph', and each includes 
(though most are much larger than) the well-known 
conserved polymerase motifs. All homomorphs 
contact the template tunnel or nucleoside 
triphosphate (NTP) entry tunnel and the exterior of 
the protein, suggesting they constitute a structural 
and functional skeleton common among the 
polymerases. 

INTRODUCTION 

The polymerase protein family has been studied extensively 
for >40 years. This interest has been motivated by 
their unique function, to replicate all forms of life, and 
confounded by their sequence diversity. As more tertiary 
structures of polymerases were solved, it became apparent 
that widely diverse sequences form highly similar structures. 
Until recently, however, there was no time-effective 
computational method for comparing these structures in 
detail. The objective of this study was to 
clarify the relationship between structure and sequence 
in a group of RNA-dependent RNA polymerases 
(RdRps) that replicate many of the viruses that represent 
significant threats to life throughout the world. We 
selected well-studied species in order to maximize the 
amount of experimental data that could be used to 
evaluate the association of functional residues and struc- 
ture (Table 1). We used the StralSV algorithm (1) to 
perform structure comparisons between all of the 
selected species. We created maps of residue-to-residue 
(R2R) correspondence from which we determined the 
boundaries of structurally similar segments — which we 
named 'homomorphs'. In contrast to the relatively short 
lengths of previously described motifs, we found that most 
homomorphs are long, and each provides a structural con- 
nection between the template tunnel or NTP entry tunnel 
and the exterior of the protein. 

The tertiary structure of the replicative unit of most 
RdRps is highly conserved (2). It resembles that of a 
right-handed palm, with finger-like folds curved inward 
to form a tunnel that encircles the template that is being 
processed (3). Most single-unit polymerases are 400-800 
residues in length. Early polymerase studies found that 
~22 residues are highly conserved in all polymerases, 
and in most species they are in the same sequential 
order (4). Some are clustered, with two to four highly 
conserved residues within a segment ~10 residues in 
length. The sequence segment that includes each of the 
highly conserved residues or clusters has been described 
as a motif. The motifs are arranged in most species in 
the order: G-F1-F2-F3-A-B-C-D-E. The birnaviruses 
(IBDV and IPNV) differ from this scheme due to a 
sequence inversion involving Motif C (C-A-B) (5). The 
references for each of the motifs and species that have 
been studied are listed in Table 2; these references were 
selected because they included either alignments of one 
or more motifs with several species, or alignments for a 
particular motif not found elsewhere. Apart from these 
conserved motifs, the RdRp sequences are highly 
variable. An extensive study of picornaviruses by 
Koonin et al. (6) illustrates this variability. The basis 
of Koonin et al.'s study was an alignment of 64 
species using the multiple sequence comparison by log 
expectation (MUSCLE) algorithm (7) and a 
manual adjustment. The alignment included four species 
that are also in our sample group, and these had a total 
of 32 conserved residues within sequences ~550 residues 
in length. 

Table 1. Viral species and PDB structures used as queries in this study 
Queries of both open and closed structures were used to assess the impact of this feature on R2R correspondence. 
Enterobacteria bacteriophage T7 [NCBI:NC_001604], which has a genome of 39 937 bp and 60 proteins and infects 
Escherichia coli, contains both RNA polymerase (T7 RNAP) and DNA polymerase (T7 DNAP). The T7 RNAP, 
typified by PDB:1S77, replicates multiple sequences of RNA initiated from transcription sites throughout the genome. 
The T7 DNAP, typified by PDB:1T7P, '...fills DNA gaps that arise during DNA repair, recombination and replication...' 
and is a Family A polymerase (NCBI:NC_001604). 

The analysis that we present in the following pages dem- 
onstrates that highly similar structures can be formed by 
very different sequences. Structure comparison, rather 
than sequence comparison, enabled us to readily recognize 
functionally significant segments of similarity and differ- 
ence between sequences. 

MATERIALS AND METHODS 

Sample selection and processing 

In total, 18 well-studied viral species with solved RNA 
polymerase structures and four viral species with solved 
DNA polymerase structures were selected for analysis. 
We used the StralSV algorithm (http://proteinmodel.org/), 
described in detail previously (1), to perform the analyses. 
StralSV compares the R2R (structural) correspondence of 
each sequence in a set of reference sequences to a specified 
query sequence, beginning at the start of the sequence and 
continuing to the end, by evaluating successive overlapping 
segments of a user-selected length. Each of the selected 
structures was used as a query to all structures available 
in the Protein Data Bank (PDB release 2011_01_25; 
number of chains 176 365) (8). The results were filtered 
for structural segments with at least 55% LGA_S structure 
similarity (1) to at least one query segment 90 amino 
acids in length (the size cutoff for the structural context). 
From these segments, R2R correspondences were extracted from 
local, tightly superimposed spans, defined as continuous segments 
at least five amino acids in length. These parameters 
together contributed to the identification of common 
regions of structure similarity, which were used to distin- 
guish regions of conservation (structure matches) from 
regions in which structure deviates (non-matches). The 
StralSV comparison of each initial query to all structures 
in PDB resulted in the identification of a final set of 37 PDB 
structures that were used as a reference polymerase struc- 
ture set in this study. One representative of each species 
within the set of 37 PDB structures was used to create an 
all-against-all structure comparison using StralSV. The 
species and PDB identities of this reference set are 
summarized in Table 1. 
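
The span-extraction step described above can be sketched in a few lines. This is only an illustration of the stated rule (continuous segments of at least five residues), not the StralSV implementation. The function name, the per-residue distance list used as input and the use of a 5 Å distance cutoff to decide whether a residue pair is superimposed are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def superimposed_spans(
    distances: List[Optional[float]],
    cutoff: float = 5.0,   # assumed distance cutoff in angstroms
    min_len: int = 5,      # minimum span length stated in the text
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) indices of continuous runs in which every
    residue pair is within `cutoff` and the run has at least `min_len` residues.
    A value of None marks a query residue with no aligned partner."""
    spans: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, d in enumerate(distances):
        matched = d is not None and d <= cutoff
        if matched and run_start is None:
            run_start = i                      # a new run begins here
        elif not matched and run_start is not None:
            if i - run_start >= min_len:       # keep only sufficiently long runs
                spans.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(distances) - run_start >= min_len:
        spans.append((run_start, len(distances) - 1))
    return spans
```

With a hypothetical distance profile, a run of five matched residues is kept, a run of two is discarded, and a run of six reaching the end of the sequence is kept.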



Creation of structure maps 

The output from the all-against-all structure comparisons 
was parsed to extract R2R correspondences for each 
query/template pair. In each comparison the full query 
sequence was represented. At some positions in the 
template sequences, gaps occurred either due to 
structure deviation exceeding the alignment cutoff (5 Å) 
or because the template contained additional residues 
(e.g. a loop) without correspondence in the query 
structure. 
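
As a rough illustration of how a query-anchored map row could be assembled, the sketch below writes one row per template over the full query length and marks positions without correspondence with 'x', the symbol used in the figures of this article. The data layout (a dictionary from 1-based query position to the matched template residue), the function names and the example sequences are hypothetical, and the maps in this study were compiled in a spreadsheet rather than with code of this kind.

```python
from typing import Dict, List


def map_row(query_len: int, r2r: Dict[int, str], gap: str = "x") -> str:
    """Build one structure-map row for a template.
    `r2r` maps 1-based query positions to the template residue (one-letter
    code) matched at that position; unmatched positions receive `gap`."""
    return "".join(r2r.get(pos, gap) for pos in range(1, query_len + 1))


def structure_map(query_seq: str, templates: Dict[str, Dict[int, str]]) -> List[str]:
    """Return text rows: the full query first, then one row per template."""
    rows = [f"{'query':<8}{query_seq}"]
    for name, r2r in templates.items():
        rows.append(f"{name:<8}{map_row(len(query_seq), r2r)}")
    return rows


if __name__ == "__main__":
    # Hypothetical five-residue query and two templates, for illustration only.
    for row in structure_map("GDDSV", {"T1": {1: "G", 2: "D", 3: "D"},
                                       "T2": {2: "D", 3: "D", 5: "I"}}):
        print(row)
```

Because every row has the length of the query, template insertions such as extra loops never widen the map; they appear only as a break in the template numbering.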

A structure map was created for each set of R2R cor- 
respondences derived from each query/template-set 
StralSV comparison by combining the data for each 
binary alignment in an Excel spreadsheet. In this article, 
we report the structure map using poliovirus RdRp as the 
primary query (Figures 2-8), although in most cases we 
include structure maps with other species as query 
(Figures 2 and 4-8). Alternative query species were used 
when structure similarity of some of the templates was not 
identified with poliovirus as query (e.g. Motif G in WNV 
and DENV were identified based on DENV as the query). 
The query that contributes to non-poliovirus matches is 
indicated on each alignment ('q' following the species 
abbreviation). 
The structure maps were used as the basis for the structure 
alignments described in this article. On all structure 
maps, we identified the Motifs A-F as described by Gong 
and Peersen (9) by coloring the background of the 
columns matching the residues of the motif orange and 
the background of the columns matching the highly 
conserved residues yellow. A similar coloring scheme 
was used for Motif G, except that it depicts the residues 
identified by Pan et al. (5) because Motif G was not 
specified in the Gong and Peersen study (9). On all struc- 
ture maps, we colored the residues of the picornaviruses 
blue; the caliciviruses green; HCV and BVDV 
(flaviviruses) black; WN and DENV (flaviviruses) red; 
PHI6 black; REOV brown; ROTAV, IBDV, IPNV 
black; HIV purple; TERT, T7RNAP, N4 black; TAQ tur- 
quoise and T7DNAP black. 

The segment of conserved structure adjacent to each 
motif was determined from the StralSV maps. For each 
of the queries and each of the motifs, the location at which 
structure conservation of most species became discontinu- 
ous was noted. We defined the boundaries of a 
homomorph as the position at which the structural 
segment shared by all representatives in a set became dis- 
continuous in more than two species. In all structure 
maps, the conserved segments that we identified based 
on StralSV R2R correspondences are colored light blue. 
We defined the homomorph of each motif as the segment 
consisting of the conserved motif plus the adjacent struc- 
turally conserved segments. The length of each 
homomorph varied somewhat depending on the query. 
For each query species, the start and end of the homomorphic 
segment of each motif were recorded, and a 
20 × 20 matrix for each motif was generated to compile 
the data from each query (data not shown). This matrix 
was used to identify the minimum start and maximum end 
of each homomorph, and these values are summarized in 
Supplementary Table S1. These values are plotted in 
Figure 1, which illustrates the maximal expanse of the 
homomorphic segments that include each of the polymer- 
ase motifs. All of the tertiary structures were illustrated 
using the Cn3D program (10). 
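
The boundary rule and the compilation across queries can be sketched as follows. This is one possible reading of the definition, under the assumption that 'discontinuous in more than two species' can be approximated column by column as 'more than two species lack R2R correspondence at that position'. The function names and the boolean match matrix are hypothetical, and the boundaries reported here were determined from the structure maps rather than computed this way.

```python
from typing import Dict, List, Tuple


def extend_homomorph(
    matches: Dict[str, List[bool]],
    motif_start: int,
    motif_end: int,
    max_discontinuous: int = 2,
) -> Tuple[int, int]:
    """Extend outward from a motif (0-based, inclusive column indices) for as
    long as no more than `max_discontinuous` species lack correspondence in
    the next column. `matches[species][col]` is True where that species has
    R2R correspondence with the query at column `col`."""
    n_cols = len(next(iter(matches.values())))

    def column_ok(col: int) -> bool:
        missing = sum(1 for row in matches.values() if not row[col])
        return missing <= max_discontinuous

    start = motif_start
    while start > 0 and column_ok(start - 1):
        start -= 1
    end = motif_end
    while end < n_cols - 1 and column_ok(end + 1):
        end += 1
    return start, end


def maximal_extent(bounds_by_query: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Combine per-query boundaries into the minimum start and maximum end."""
    starts, ends = zip(*bounds_by_query.values())
    return min(starts), max(ends)
```

Running `extend_homomorph` once per query and passing the results to `maximal_extent` mirrors the minimum-start and maximum-end compilation described above, although a stricter reading of the rule (for example, requiring an unbroken run within each species) would give somewhat different boundaries.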

RESULTS 

Overview 

Structural examination of the sequence motif regions 
yielded extended regions of structural conservation. We 
named each of these regions a 'homomorph', defined as 
a sequence segment that shares a highly similar tertiary 
structure with other species, independent of the sequence 
composition. We found that most of the homomorphs 
were at least twice as long as the corresponding 
sequence motif. The extent of this expansion is illustrated 
in Figure 1. The length of each homomorph was 
determined separately for each species, using a single 
structure for each species as the query in a StralSV 
analysis. The identity of the start and end of each 
homomorph depends on the structural similarity to a 
given query. Therefore, there is some minor query-specific 
variability of the location of the ends of homomorphs that 
can be observed in Figure 1. Within the homomorphs, 
non-matching residues can be used to identify minor dif- 
ferences between species. In most single-stranded RdRp 
(ss-RdRp) species (PV, COXS, HRV, FMDV, NV, 
RHDV, SAPV, HCV, BVDV, WN and DENV), the 
homomorphs of Motif G are the largest (median of 53 
residues), followed by A (49), B (46), E (37), F3 (28), D 
(23), C (17), F1 (10) and F2 (8). In double-stranded 
RdRps (ds-RdRps) (PHI6, REOV, ROTAV, IBDV and 
IPNV), the homomorphs of Motif G are relatively short 
(median of 12 residues). Several other homomorphs are 
also shorter in ds-RdRps: B (48), A (41), E (31), F3 
(28), D (16), C (12), F1 (11) and F2 (2). The lengths and 
occurrences of homomorphs of polymerases that are 
associated with DNA (HIV, TERT, TAQ, T7 DNAP, 
T7 RNAP and N4) are variable and will be discussed in 
the sections describing each motif. 

The homomorphs of all species are similarly distributed 
over the length of the polymerase (Figure 1). The 
ss-RdRps (PV, COXS, HRV, FMDV, NV, RHDV, 
SAPV, HCV, BVDV, KUNJ and DENV) are most 
similar to each other. The spacing between homomorphs 
is more variable in the ds-RdRps (PHI6, REOV, ROTAV, 
IBDV and IPNV), and in general larger than in the 
ss-RdRps. The homomorph of Motif C (hmC) is identified 
in the birnaviruses despite a sequence inversion that places 
it before Motif A (5). Relatively large segments between 
homomorphs occur in PHI6 between F1 and F3, and in 
KUNJ, DENV and PHI6 between B and C. The spacing 
between motifs is notably reduced in HIV and TERT 
(RdDps). In birnaviruses (IBDV, IPNV), the homo- 
morphs of C and A are only three residues apart, and 
the distance between the homomorphs of Motif F3 and 
C is greater than the typical F3-B distance. Most 
homomorphs are separated from each other by a 
segment that contains a turn (secondary structure), or 
there is a turn at the beginning or end of the homomorph. 

Within all RdRps, all motifs occur within a length of 
375 residues. In T7-DdRp (T7 RNAP) and N4, the motifs 
are spread out over approximately 600 residues. The 
amount of R2R correspondence for most of the RdRps, 
determined from the minimum and maximum values of all 
homomorphs, is ~75% over the span from Motif G 
through Motif E (Supplementary Table S1). 
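
The coverage figure and the spacing between homomorphs follow from simple interval arithmetic, sketched below. The interval coordinates in the usage note are invented for illustration and are not taken from the supplementary table, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive residue positions (start, end)


def coverage(homomorphs: List[Interval]) -> float:
    """Fraction of residues, from the start of the first homomorph to the end
    of the last, that lie within at least one homomorph."""
    ordered = sorted(homomorphs)
    span_start = ordered[0][0]
    span_end = max(end for _, end in ordered)
    covered = 0
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start <= cur_end + 1:          # overlapping or adjacent: merge
            cur_end = max(cur_end, end)
        else:                             # gap: close the current block
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1
    return covered / (span_end - span_start + 1)


def spacings(homomorphs: List[Interval]) -> List[int]:
    """Residues between consecutive homomorphs (0 if adjacent or overlapping)."""
    ordered = sorted(homomorphs)
    return [max(0, nxt[0] - cur[1] - 1) for cur, nxt in zip(ordered, ordered[1:])]
```

For example, four hypothetical homomorphs at (1, 40), (61, 100), (101, 150) and (191, 200) cover 140 of 200 residues, so `coverage` returns 0.7 and `spacings` returns gaps of 20, 0 and 40 residues.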

Homomorph of Motif G results 

The structurally aligned sequences that comprise hmG are 
summarized in Figure 2A. 

The R2R correspondences of WN and DENV could not 
be evaluated for the Motif G region (approximately PV 
101-121), as the structural configuration of the segments 
of these viruses that would be expected to match the 
homomorph of Motif G (hmG) segment had not been 
determined. Within the homomorph, most of the 
ss-RdRps were highly similar (Figure 2A, top). BVDV is 
similar to the other ss-RdRps in the N-terminal segment, 
but no longer matches them at the C-terminal segment. 
Only Motif G, and not a homomorph, was identified in 
PHI6, IBDV and IPNV. In the region of Motif G, StralSV 
did not identify R2R correspondences between any of the 
RNA polymerases and REOV, ROTAV, HIV, TERT or 
DNA-dependent polymerases (TAQ, T7 DNAP, T7 
RNAP and N4). 

Figure 1. The distributions and lengths of the homomorphs and conserved motifs of viral polymerases in this study. The homomorphs of each 
species are illustrated from the start of Motif G, sequence position = 0, through the end of Motif E. The homomorphs and motifs are colored as 
follows: G (green), F1-F2-F3 (maroon-gray-black), A (blue), B (gold), C (red), D (purple), E (aqua). The darker bars show the sequence positions 
of the homomorphs, and the lighter bars show the sequence positions of the motifs as described by Gong and Peersen (9) or Pan et al. (5). The 
number of residues from the start of the polymerase structure to the start of the first homomorph is identified for each species at the left of the chart. 
The PDB structures, and consequently, sequence position numbers for KUNJ, DENV and TAQ, which are used throughout this article, do not begin 
at the polymerase; therefore, for this figure, the distance from the start of the polymerase is shown after the slash. For species lacking Motif G, the 
first identified homomorph is indicated at the left of the start position. The length of the polymerase of each species is listed at the right of the chart. 

There were structural discontinuities within the 
homomorph (noted by x in Figure 2A) and similar 
discontinuities within the motif. These minor 
discontinuities identify species-specific differences within 
a segment that is otherwise highly continuous in several 
species. For example, FMDV has 3 AA between S87 and 
T90 (PV numbering) and therefore does not match the 
structure of PV 88-LD-89, which has only 2 AA. 
In contrast, WN and DENV have the same number of 
residues for the gaps from A388-R392 and 
G385-R389, respectively, but these segments were not 
structurally aligned by StralSV within the parameters 
used in this study. The segment PV-Y102 to A109 is a 
β-hairpin unique to picornavirus RdRps (11). The 
numbering on NV and HCV clarifies that these 
regions are continuous in these species (and the 
other caliciviruses, RHDV and SAPV, though not 
numbered). 

Figure 2B and C illustrates the tertiary structure of the 
homomorph using a poliovirus structure (PDB:1RA6). 
Most of the N-terminal segment is a single helix that 
extends over nearly half of the surface of the protein. 



Both ends of the homomorph terminate at the exterior 
surface of the protein. 

The distance between the homomorphs of Motifs G and 
F1 was 20-37 residues in the ss-RdRps (in all species 
where both were present) and longer in the ds-RdRps 
(median 47 residues) (Figure 1). 



Homomorph of Motif F results 

Three components of Motif F have been recognized: F1, 
F2 and F3 (2,5). In some species there are sequence 
segments between these motifs. In every species in our 
sample set that has R2R correspondence within Motif F, 
except PHI6, the three F motifs are 
continuous; therefore, we have combined them, and the 
adjacent structurally aligned segments, into a single 
homomorph. The structurally aligned sequences that 
comprise the homomorph of Motif F (hmF) for RdRps 
and HIV are summarized in Figure 3A. HmF extended 
five residues upstream from the N-terminal edge of 
Motif F1 [as defined by Gong and Peersen (9)] and ~20 
residues downstream from Motif F3 [as defined by Gong 
and Peersen (9)]. HmF1 was found in all RdRp species 
except WN and DENV; it was not possible to evaluate 
R2R correspondence for this segment of WN and 
DENV as the structure of this segment has not been 
resolved. Motifs F1 and F2 are always continuous if F2 
is present, and Motif F2 is present in most species. 
Motif F2 is represented by a single residue in PHI6, 
REOV and ROTAV (dsRNA), two residues in HIV 
(RdDp), 15 residues in BVDV and 10 residues in 
HCV (two of which, in HCV, are structurally aligned 
to the other RdRps). Motif F2 varied in length from 
6 to 15 residues. In PHI6, there was a 61-residue 
segment between F1 and F3. HmF3 was present in all 
RdRp species. 

Figure 2 panels: A. Structure map of the homomorph of Motif G (hmG); B. HmG in poliovirus [PDB:1RA6]; C. Surface exposure of hmG in poliovirus. 

Figure 2. (A) The residues comprising Motif G as described by Pan et al. (5) and Gorbalenya et al. (35) are indicated by a gold background, and 
those that are highly conserved by a yellow background. The homomorph relative to PV is 60 residues in length and includes the motif and segments 
on both sides of the motif (light blue background). Conserved residues outside of Motif G have a pink background. A small 'x' indicates no R2R 
structural correspondence at that position. The upper part of the alignment is based on a PV query. The middle section of the alignment, which 
includes WN and DENV, is based on a DENV query. Motif G as described by previous researchers is shown at the lower section of the figure. Both 
Pan et al. (5) and Gorbalenya et al. (35) identify Motif G in IBDV and IPNV, although Pan et al.'s description includes more residues. In PHI6, 
IBDV and IPNV, only a short segment of Motif G was found to have R2R correspondence with the RdRps, and only using a NV query; these 
segments are similar to those described by Pan et al. (5), but slightly shorter. (B) All the structural diagrams in this article are illustrated using 
poliovirus polymerase (PDB:1RA6). The N-terminal segment of hmG is shown in blue, Motif G in gold and the C-terminal segment of hmG in 
brown. (C) This figure, which is rotated ~90° from Figure 2B, shows the surface exposure of hmG and the residues within the motif that line the wall 
of the tunnel. 

Figure 3B and C illustrate the tertiary position of hmF. 
Most of the structure is hairpin-like, with some residues of 
Motif F2 at the apex, which is located at the exterior 
surface of the protein. HmF1 and hmF3 are approximately 
parallel for several residues. HmF3 then independently 
extends to the surface of the protein approximately 
opposite the Motif F2 site. Figure 3D shows the N- and 
C-terminal residues and some residues of the C-segment of 
hmF3 at the surface of the protein. Figure 3E shows the 
position of Motif F2 relative to the template tunnel. 

The segments between hmF and hmA are 8-17 residues 
in ss-RdRps and PHI6, 28-30 residues in ds-RdRps 
(REOV, ROTAV, IBDV and IPNV), 30-40 residues in 
RdDps (HIV and TERT) and DdRps (T7 RNAP and 
N4) and 102-131 in DdDps (TAQ and T7 DNAP) 
(Figure 1). 

Figure 3 panels: A. Structure map of the homomorph of Motif F (hmF); B. HmF in poliovirus [PDB:1RA6]; C. HmF transects the protein; D. Terminal residues of hmF at the protein surface; E. Motif F2 is at the apex of the hmF fold. 

Figure 3. (A) The residues of Motifs F1 and F2, according to Gong and Peersen (9), are indicated by an orange background and the highly 
conserved residues by a yellow background. HmF includes the residues with light blue background, and Motifs F1, F2 and F3. Lower case residues 
indicate that they are present but do not match the structure. (B) HmF1 (light blue), illustrated using the poliovirus structure 1RA6, begins about 
five amino acids upstream from Motif F1 (dark blue) and ends at hmF3 (dark brown), about 20 amino acids downstream from Motif F3 (orange). 
Highly conserved residues from Motifs F1, F2 and F3 line the template tunnel (empty space within the crystal near the intersection of F1-F2-F3). 
Motif F2, which varies in length and composition between species, is at the exterior of the protein. (C) Most of hmF is folded back on itself, and 
Motif F2 (gray) is at the apex of this fold at the exterior surface of the protein. (D) The terminal residues of hmF (blue) are located at the surface of 
the protein. In addition to the N-terminal (blue) and the C-terminal (brown) of hmF, several residues of the C-terminal segment (brown) 
186-AMRMA-190 are also at the surface of the protein. (E) The length of Motif F2 (black) is highly variable in different species. Motif F3 
(gold) is the start of a highly conserved segment of the homomorph that transects the protein and terminates at a nearly opposite exterior surface. 

Homomorph of Motif A results 

The structurally aligned sequences that comprise 
homomorph of Motif A (hmA) are summarized in 
Figure 4A. The homomorph (relative to PV) extends 15 
amino acids from each flank of Motif A, plus the length of 
a species-specific loop at the N-terminal segment of the 
motif. All ss-RdRps (PV, COXS, HRV, FMDV, NV, 
RHDV, SAPV, HCV, BVDV, WN and DENV) are well 
aligned in the N-terminal segment. PHI6, ROTAV, IBDV 
and IPNV (ds-RdRps) have fewer aligned residues. REOV 
(ds-RdRp) and HIV and TERT (RdDps) do not have 
R2R correspondence with the ss-RdRps. The DdRps 
(T7 RNAP and N4) and DdDps (TAQ and T7DNAP) 
share a homomorphic structure within the N-terminal 
segment, but it is substantially different from the RdRp 
structure and therefore is not included in the homomorph 
or Figure 4A. Within the motif, HIV corresponds only to 
NV and SAPV (only found using an HIV query), 
indicating a significant structural difference from other 
species; HIV also lacks R2R correspondence beyond the 
motif and therefore is not included in the homomorph. At 
the C-terminal segment of the homomorph, most species 
in the sample set, except HIV, have a homologous structure. 
At some sequence positions within hmA, a particular 
residue composition is conserved throughout a viral 
family (e.g. picornavirus), and a different residue composition 
is conserved in another viral family at the same 
position. This within-family sequence conservation 
(>75%) occurs at the following sequence positions 
(shown in Figure 4A, PV numbering): 214, 234, 237-240, 
245 and 249. 

Figure 4 panels: A. Structure map of the homomorph of Motif A (hmA); B. Structure of hmA; C. Terminal residues of hmA; D. Residues related to life cycle in hmA. 

Figure 4. (A) HmA (light blue background) extends about 15 residues from each side of Motif A as defined by Gong and Peersen (9) (orange 
background), and includes highly conserved residues (yellow background). Non-aligning residues (compared with the query) are indicated by an 'x'. 
In cells with a light blue background filled with a number, the number is the sequence position of the adjacent matches for each species; numbers in 
the white column between them summarize the length of sequence that the non-matched sequence represents in each species. In this segment there are 
more residues in each species than between the corresponding residues in PV, indicating that this region is a loop that is absent in PV, and the loop 
length varies by species. At the left of the alignment (209-214, uncolored), there is a structure common to several species, but too few to qualify the 
region as part of the homomorph. (B) In this figure of poliovirus (PDB:1RA6), the N-terminal segment of the homomorph is blue and the C-terminal 
segment is brown. The terminal residues of hmA are at the exterior surface of the protein (PDB:1RA6). Motif A is centered within the homomorph 
at the wall of the template tunnel. (C) The terminal residues of the homomorph and the helix adjacent to each are constituents of the protein surface. 
(D) In PV, an insertion (red) at the C-terminal edge of the motif is lethal: L241-i-S242 (42). A species-specific loop (green) affects the catalytic rate (in 
PV) (37). 
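
The within-family conservation criterion (>75%) for hmA can be expressed as a small check on a single aligned position. The sketch below is illustrative only: the function name, the input layout and the residues in the example are hypothetical, and the family groupings must be supplied by the user.

```python
from collections import Counter
from typing import Dict, List, Tuple


def family_conserved(
    column: Dict[str, str],
    families: Dict[str, List[str]],
    threshold: float = 0.75,
) -> Dict[str, Tuple[str, float]]:
    """For one aligned position, return {family: (residue, fraction)} for each
    family whose most common residue is found in more than `threshold` of its
    members. `column` maps species abbreviations to the residue at that
    position; species missing from `column` are ignored."""
    result: Dict[str, Tuple[str, float]] = {}
    for family, members in families.items():
        residues = [column[m] for m in members if m in column]
        if not residues:
            continue
        residue, count = Counter(residues).most_common(1)[0]
        fraction = count / len(residues)
        if fraction > threshold:
            result[family] = (residue, fraction)
    return result


if __name__ == "__main__":
    # Invented residues at a single position, for illustration only.
    column = {"PV": "D", "COXS": "D", "HRV": "D", "FMDV": "D",
              "NV": "E", "RHDV": "E", "SAPV": "Q"}
    families = {"picornavirus": ["PV", "COXS", "HRV", "FMDV"],
                "calicivirus": ["NV", "RHDV", "SAPV"]}
    print(family_conserved(column, families))
```

A position is of interest in the sense used here when two or more families each pass the threshold but with different residues.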

Within the N-terminal side of the homomorph, at the 
edge of the motif (PV 226-227), there is a minor discon- 
tinuity in structure homology (Figure 4A). The distance 
between the discontinuities in each species is provided in a 
column within the figure (white) that indicates the 
entire span over which discontinuity exists for each 
species. However, the loop represented by this discontinu- 
ity varies in length by only one to four amino acids. 

Figure 4B illustrates the tertiary structure of the 
hmA. Each end of the homomorph is at the exterior 
surface of the protein (Figure 4C), and its center — the 
conserved Motif A — is at the surface of the template tunnel. 
The overall configuration of the homomorph is spring-like 
(Figure 4D). The species-specific loop within the 
homomorph is located at the exterior of the protein. 



The sequence segment between the homomorphs of 
Motif A and Motif B (hmB) is ~4-20 residues in the 
RdRps, and mostly greater than 20 residues in the 
DNA-dependent polymerases. It is relatively long in 
REOV (41), T7 RNAP (81) and N4 (98). In the 
birnaviruses (IBDV, IPNV), Motif C precedes Motif 
A in sequence; this sequence inversion is described in a 
later section of this article, which describes Motif C. 

HmB results 

Motif B is a component of the largest homomorph 
identified in the RdRps. The homomorph begins 21 
residues upstream from Motif B and extends 10 residues 
downstream. The motif is 15 residues long. The size of the 
homomorph is consistent in most species. The structurally 
aligned sequences that comprise the homomorph are 
shown in the top section of Figure 5A. They include all 
the RdRps in the sample set plus TERT (RdDp). Each of 
these species matched a poliovirus query, indicating there 
is greater structural similarity than in other homomorphs 
and motifs. The N-terminal segment of the homomorph 
contains some discontinuities that are resolved by using 
R2R matches for alternative queries (Figure 5A, lower 
section). The C-terminal segment of the homomorph is 
well represented in all RNA polymerases and TERT. No 
R2R correspondence was found between the residues 
comprising hmB in the RNA polymerases and residues 
in the DNA-dependent polymerases (T7 RNAP, N4, 
TAQ and T7 DNAP). 

Figure 5 panels: A. Structure map of the homomorph of Motif B (hmB); B. HmB in poliovirus [PDB:1RA6]; C. HmB on the protein surface; D. Terminal residues of hmB. 

Figure 5. (A) A StralSV alignment based on the PV query is located at the top section of this figure. The middle section compares some of the R2R 
matches using other queries, to those found by poliovirus. Residues common to both alignments indicate extremely close matches, and those 
differently represented are closer to the respective query. The species TAQ-N4-T7 DNAP-T7 RNAP align with each other over about 20 
residues, but do not align structurally with the other species. (B) HmB [illustrated using poliovirus (PDB:1RA6)] is the largest homomorph in 
the RdRps. The N-terminal segment of hmB (blue) begins at the exterior surface of the protein, folds back on itself to form a classical β-hairpin 
(at the apex, PV 275-YKN-278), and then continues to Motif B (gold), which is located at the surface of the template tunnel. The C-terminal segment 
of the homomorph (brown) continues as a single chain to the exterior surface of the protein. (C) The N-terminal segment of hmB is at the surface of 
the protein. (D) Each of the terminal residues of hmB, and the apex of the N-terminal loop are on the exterior surface of the protein. 

The lower section of Figure 5A illustrates the 
dependence of the R2R correspondence on the query 
sequence. These differences make it possible to 
identify fine details between structures. Our definition of 
each of the homomorphs, however, is based on the inclu- 
sion of all R2R alignments using all queries in the 
sample set. 

The position of the hmB within the tertiary structure of 
PV is illustrated in Figure 5B. The N-terminal residue is at 
the exterior surface of the protein. The N-terminal 
segment is a classical β-hairpin protein structure that is 
folded back on itself and is almost entirely exposed on a 
surface nearly perpendicular to the face of the protein that 
contains the N-terminal residue (Figure 5C). The base of 
the loop transitions to Motif B at the template tunnel. The 
C-terminal side of the homomorph extends from the 
tunnel to the exterior surface of the protein (Figure 5D). 

The distance between the homomorphs of Motifs B and 
C (hmC) (Figure 1) is no more than 17 residues in all RdRps except KUNJ 
and DENV, which are 36 and 35 residues, respectively. In 
the DNA-dependent polymerases, this distance is between 
98 (TAQ) and 258 (N4) residues. In IBDV and IPNV, the 
segments between the homomorphs of Motifs B and D are 
16 and 11 residues, respectively. 

HmC results 

The structurally aligned sequences that comprise hmC are 
shown in Figure 6A. Motif C is the only RdRp motif that 
is not a component of a larger homomorphic structure.

[Figure 6 panels: (A) Structure map of the homomorph of Motif C (hmC); (B) HmC in poliovirus (PDB:1RA6); (C) Terminal residues of hmC]




Figure 6. (A) The high number of species that align to PV indicates that the structure of Motif C is highly conserved. A T7 DNAP query
was required to identify the matches for the N4, TAQ and T7 RNAP species. HmC is the only homomorph for which there is R2R
correspondence in all species of the study group. (B) Motif C (gold) is the only motif in the RdRps that is not a component of a larger structure. 
Motif C [illustrated using poliovirus (PDB:1RA6)] is tightly folded upon itself in a manner that places the highly conserved residues (yellow) at the 
tunnel wall, whereas the N-terminal segment of the motif (blue) and C-terminal segment of the motif (brown) are parallel to each other and penetrate 
the protein. (C) The terminal residues of both the N- and C-terminal segments are at the surface of the protein. 



The segments immediately adjacent to both flanks of 
Motif C do not even cluster into subgroups. Motif C is 
short (12 residues in most RdRps) and folds sharply back
on itself (Figure 6B). The highly conserved residues 
(labeled Motif C) are at the surface of the template 
tunnel and both the N-terminal and C-terminal residues 
are at the exterior surface of the protein (Figure 6C). 

In the birnaviruses IBDV and IPNV, there is a sequence 
inversion that results in the relocation of Motif C to a 
position immediately preceding Motif A. Figure 7A 
shows an alignment that documents this inversion. The 
top and bottom segments of Figure 7A illustrate that all 
species are well aligned upstream of Motif C (IPNV pos- 
itions 365-372) and within Motif A (IPNV positions 



399-409). RHDV, SAPV and BVDV are not well
aligned within Motif C using the IPNV query, and there- 
fore are missing from the middle section of Figure 7A 
(IPNV positions 382-393). The numbering of IPNV and 
IBDV is sequential, indicating that Motif C precedes 
Motif A in these species. The numberings of NV and 
HCV indicate there are R2R matches with IPNV at 
Motif C, but that over this segment the match is not in 
sequential order. Using a PV query, however, all of these 
species have R2R matches over this segment (shown in 
Figure 6A). The IPNV query indicates that the structure 
of Motif C of the birnaviruses more closely matches NV 
and HCV than the others in the sample set. The difference 
in linear order that results from the sequence inversion is

[Figure 7 panels: (A) Structure map of Motif C using a birnavirus query; (B) Homomorphs A-B-C in poliovirus (PDB:1RA6); (C) Homomorphs C-A-B in IBDV (PDB:2PGG)]




Figure 7. (A) The sequence of the IPNV query is listed vertically in this table, with the sequence position number of each IPNV residue at the left in 
the first column. The R2R corresponding residues for each species are listed in the columns to the right. In birnaviruses (IBDV and IPNV), unlike 
other RdRps, Motif C precedes Motif A. The top segment of this figure (IPNV positions 365-372) shows that all species (that align to IPNV) are 
well aligned prior to Motif C. IPNV, NV, HCV and IBDV are aligned in Motif C (IPNV positions 382-393), but there are no R2R matches for 
RHDV, SAPV and BVDV. In Motif A, at the bottom segment of the figure (IPNV positions 399-409), all species are again well aligned (though
fewer residues are aligned in RHDV). (B) In poliovirus (illustrated here with PDB:1RA6), the sequence order of homomorphs is: A (blue), B (gold) 
and C (red). (C) In IBDV, the difference in sequence order is compensated by a modified structure that maintains the motifs within a tertiary 
position that is similar to all other RdRps. The sequence order of the IBDV homomorphs is: C (red), A (blue) and B (gold). 



compensated by a modified structure that maintains the 
motifs within a tertiary position that is similar to all other 
RdRps (Figure 7B and C). 
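
The inversion is recognized from residue numbering: along the IPNV query, a species whose matched positions keep increasing shares the query's linear order, whereas a backward step in numbering signals a rearrangement. The following sketch shows one simple way to test this; the function names and the toy columns are assumptions made for illustration and do not reproduce Figure 7A.

```python
def matched_positions(column):
    """Drop query rows for which the species has no R2R match (None)."""
    return [pos for pos in column if pos is not None]


def is_sequential(column):
    """True if matched residue numbers strictly increase along the query."""
    matched = matched_positions(column)
    return all(a < b for a, b in zip(matched, matched[1:]))


def order_breaks(column):
    """Indices in the matched list where numbering steps backwards."""
    matched = matched_positions(column)
    return [i + 1 for i, (a, b) in enumerate(zip(matched, matched[1:])) if b <= a]


# Hypothetical columns of a structure map, listed in query order.
same_order = [100, 101, 102, None, 150, 151]
rearranged = [200, 201, 330, 331, 240, 241]
```

Here `same_order` passes the check despite an unmatched row, while `rearranged` breaks at the point where numbering drops from 331 to 240.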

The distance between the hmC and hmD (homomorph 
of Motif D) is <10 residues in the RdRps. It is relatively 
large in PHI6 (25 residues) and is indeterminate in the 
DdDps, as neither Motif D nor its homomorph is within 
the PDB structures included in this study. 

HmD results 

The structurally aligned sequences that comprise the hmD 
are shown in Figure 8A. The homomorph is 21 residues
long and consists of a 10-residue extension from the 
N-terminal edge of the motif plus the motif itself. The 
structure of the N-terminal segment is more highly 
conserved (i.e. has more R2R matches) than the motif. 
Various query sequences were tested with the expectation 
that they would capture additional alignments. The 



middle section of Figure 8A illustrates that this 
produced some improvement. For example, using an 
HCV query, there are R2R matches to TERT, TAQ and 
T7 DNAP. The C-terminal edge of the motif has some 
R2R correspondence, suggesting that the structure of the 
motif is moderately conserved. Using T7 DNAP as a 
query (lowest segment of the figure), only a small 
portion of the C-terminal edge of Motif D and a few 
species have similar structures. There is no alignment of 
PHI6 within the N-terminal segment of the homomorph, 
because in this region PHI6 consists of a 24-residue loop 
between the end of Motif C and the start of Motif D. 
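
Conservation is expressed here as the number of R2R matches, so two segments of a structure map can be compared by counting, at each query position, how many species have a matched residue. The sketch below assumes a map stored as nested dictionaries with `None` for missing matches; that layout, the species labels and the residues are hypothetical.

```python
def matches_per_position(structure_map):
    """For each query position, count species with an R2R match (non-None)."""
    return {
        pos: sum(residue is not None for residue in row.values())
        for pos, row in structure_map.items()
    }


def mean_matches(structure_map, start, end):
    """Average number of matched species over query positions start..end."""
    counts = matches_per_position(structure_map)
    selected = [counts[p] for p in range(start, end + 1) if p in counts]
    if not selected:
        raise ValueError("no query positions in the requested range")
    return sum(selected) / len(selected)


# Hypothetical map: query position -> {species: matched residue or None}.
structure_map = {
    1: {"spA": "L", "spB": "V", "spC": "I"},
    2: {"spA": "G", "spB": "G", "spC": None},
    3: {"spA": "K", "spB": None, "spC": None},
    4: {"spA": None, "spB": None, "spC": "R"},
}
upstream_mean = mean_matches(structure_map, 1, 2)
motif_mean = mean_matches(structure_map, 3, 4)
```

A higher mean for the upstream positions than for the motif positions would correspond to the pattern described for hmD.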

The tertiary structure of the hmD is illustrated in 
Figure 8B and C. This homomorph lies mostly at the 
exterior surface of the protein. The motif lines the wall 
of the polymerase tunnel. 

The segment between the hmD and homomorph of 
Motif E (hmE) is <15 residues in all structures in the

[Figure 8 panels: (A) Structure map of the homomorph of Motif D (hmD); (B) HmD in poliovirus (PDB:1RA6); (C) Surface exposure of hmD; (D) Terminal residues of hmD]




Figure 8. (A) The structure of the N-terminal segment of hmD (blue background) is more conserved than that of Motif D (orange background). 
There is a high amount of conservation of some residues (pink background) within the motif. Residues consistent with the motif described by Gong 
and Peersen (9) are bold. In the middle section of the figure, different queries were used to identify more of the R2R correspondences within the 
motif. In the lowest section of the figure, the alignment is based on a T7 DNAP query. There is an off-by-one alignment of ROTAV (compare lines 
labeled 'ROTAV, HCV q' and 'ROTA, T7 DNAP q', both italicized), suggesting an unusual structural conformation. (B) The homomorph of Motif 
D [illustrated using poliovirus (PDB:1RA6)] includes the motif itself (gold) and an adjacent upstream segment (blue). The motif lines the wall of the 
template tunnel. (C) Most HmD residues are located on the surface of the protein. (D) The terminal residues of hmD are located at the exterior 
surface of the protein, on a different face than the main section of the homomorph. 



sample set, except in IBDV and IPNV in which it is 28 and 
40 residues, respectively. 

HmE results 

The structurally aligned sequences that comprise hmE are 
summarized in Figure 9A. HmE is large and in most of the 
ss-RdRps (PV, COXS, HRV, FMDV, NV, RHDV, 



SAPV, HCV, BVDV, WN and DENV) it is highly 
conserved. The motif is near the N-terminal edge and a 
loop region is located near the middle of the homomorph. 
The sequences vary in length due to the loop region. The 
length of hmE in the caliciviruses (30-34 residues) is 
shorter than those in the picornaviruses (36-37 residues); 
HCV and BVDV loops are 37 and 35 residues, respectively,
and the loops of WN and DENV are the longest at

[Figure 9 panels: (A) Structure map of the homomorph of Motif E (hmE); (B) HmE in poliovirus (PDB:1RA6); (C) Terminal residues of hmE]




Figure 9. (A) Motif E is a component of a large, well-defined homomorph. There is considerable sequence similarity between PV and DENV in this 
homomorph, illustrated at the bottom of the figure. TERT and HIV have R2R correspondence over the motif segment and three amino acids on 
each side of it. HIV does not have any R2R correspondence with any species in the sample set in this region. (B) Motif E (gold), [illustrated using 
poliovirus (PDB:1RA6)] forms part of the NTP entry tunnel. Both the N-terminal segment of the homomorph (blue) and the C-terminal segment 
(brown) extend to the surface of the protein. (C) From the C-terminal edge of the motif, the homomorph extends to the surface of the protein to 
expose a species-specific loop (of variable length) (green) at the surface, then turns, transects the protein and extends to an opposite surface.



38 and 39 residues, respectively. There is strain-specific 
amino acid variability in this segment of HRV. HmE is 
well represented by all RdRps. No R2R correspondence 
was found with HIV or TERT. These species, however, 
are structurally matched to each other (Figure 9A, middle 
section). There is considerable sequence similarity between 
PV and DENV within this homomorph; this is illustrated 
in the bottom section of Figure 9A by the shaded 
conserved residues. DdRps and DdDps are not included
in the analysis of this region because the region is missing 
from the structures in our sample group. 

The tertiary structure of the hmE is illustrated in 
Figure 9B and C. Most of the homomorph is at the 
exterior of the protein near the NTP entry tunnel. 
Although it has extensive surface exposure, each 
terminus of the homomorph appears to be anchored by 
residues that are not part of the homomorph; as a result, 
the terminal residue at each end of the homomorph is 



exposed as a single residue at the exterior surface of the 
protein. Motif E is located near the N-terminal edge of the 
homomorph and contacts the surface of the NTP entry 
tunnel (2). The C-terminal segment of the homomorph is 
folded back on itself in a manner that places the
species-specific loop at the surface of the protein (Figure 9C). The
homomorph forms a double strand through PV_M392, at 
which point the remainder of the homomorph is a sin- 
gle-stranded helix that emerges at the exterior surface of 
the protein. In PV, the C-terminus of hmE (R402) is
exposed at the surface of the protein and surrounded by
the segment 28-SAFHYVFEG-36.

DISCUSSION 

Structure-based sequence alignment using the StralSV 
algorithm (1) enabled us to identify seven distinct
homologous structures in most of the polymerases in our
collection of 22 species. In the RdRps, the combined
regions of structural homology represent ~75% of the 
sequence from the start of homomorph of Motif G 
(hmG) through the end of hmE in each species (~375 
residues). There is <10% conservation of sequence com- 
position among these species. 
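
The ~75% figure is a coverage fraction: residues lying inside any homomorph divided by the length of the region from the start of hmG to the end of hmE. A minimal sketch of that calculation is given below, assuming 1-based inclusive spans; the spans and region limits are invented round numbers, not the published boundaries.

```python
def coverage_fraction(spans, region_start, region_end):
    """Fraction of region_start..region_end covered by any span (inclusive)."""
    covered = set()
    for start, end in spans:
        lo, hi = max(start, region_start), min(end, region_end)
        covered.update(range(lo, hi + 1))  # empty when the span lies outside
    return len(covered) / (region_end - region_start + 1)


# Hypothetical homomorph spans within a 160-residue region.
spans = [(1, 30), (41, 100), (121, 150)]
fraction = coverage_fraction(spans, region_start=1, region_end=160)
```

Using a set of covered positions means overlapping spans are not counted twice.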

Each of the homomorphs includes a sequence motif 
consisting of characteristic highly conserved functional 
residues that are essential to replication. The tertiary 
position of each of the homomorphs includes at least 
one residue (and sometimes more) in contact with the 
exterior surface of the protein and one or more highly 
conserved functional residues located within or at the 
wall of the template tunnel. 

We defined the boundaries of a homomorph as the 
position where the structural segment shared by all repre- 
sentatives in a set became discontinuous in more than two 
species. For many queries, this position could be confi- 
dently identified. However, these positions sometimes 
varied by one or two residues, depending on the query 
sequence. Query-dependent differences in R2R matches 
were also observed within the motifs themselves, where 
minor differences in structure resulted in a lack of R2R 
matches for short segments of some queries. Our approach 
was to set the boundary at the position where most queries 
were in agreement, but to keep in mind that these edges 
might vary by one or two residues. 
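
The boundary rule can be sketched as a walk outward from a position inside the motif that stops just before the first position where more than two species lack a match, followed by a majority vote across queries. This is only one reading of the rule stated above; the function names, the tie-breaking choice and the toy counts are assumptions.

```python
from collections import Counter


def extend_boundary(unmatched_counts, seed, step, max_gaps=2):
    """Walk from `seed` in direction `step` (+1 or -1) while at most
    `max_gaps` species lack an R2R match at the next position."""
    pos = seed
    while (pos + step) in unmatched_counts and unmatched_counts[pos + step] <= max_gaps:
        pos += step
    return pos


def consensus_boundary(boundaries):
    """Most frequent boundary across queries; ties go to the smaller position
    (an arbitrary choice made for this sketch)."""
    counts = Counter(boundaries)
    best = max(counts.values())
    return min(pos for pos, n in counts.items() if n == best)


# Hypothetical data: query position -> number of species without a match.
unmatched_counts = {10: 4, 11: 2, 12: 0, 13: 0, 14: 1, 15: 3, 16: 0}
left_edge = extend_boundary(unmatched_counts, seed=13, step=-1)
right_edge = extend_boundary(unmatched_counts, seed=13, step=+1)
agreed_edge = consensus_boundary([11, 11, 12, 10])
```

The one- or two-residue uncertainty mentioned above would show up as a spread in the list passed to `consensus_boundary`.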

Poliovirus had R2R correspondence with other species 
in the sample set more often than did any other structure. 
In almost all instances, we were able to map functional 
features of other proteins to a structurally similar segment 
of poliovirus. This property of centrality makes it a useful 
template for polymerase structure properties. 
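
The notion of centrality used here can be illustrated by ranking query structures by the number of other species with which they have any R2R correspondence. The sketch below assumes the correspondences have already been reduced to a set of species per query; the data are invented and the ranking criterion is a simplification of what a full comparison would use.

```python
def rank_queries(hits):
    """Order queries by how many other species they match (most first).

    hits: {query: set of species with any R2R correspondence to that query}
    Ties are broken alphabetically by query name.
    """
    return sorted(hits, key=lambda q: (-len(hits[q] - {q}), q))


# Hypothetical correspondence sets, for illustration only.
hits = {
    "PV": {"HCV", "NV", "TERT"},
    "HCV": {"PV", "NV"},
    "T7 DNAP": {"TAQ"},
}
ranking = rank_queries(hits)
```

The first entry of `ranking` is the structure that would serve best as a common template under this criterion.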

HmG discussion 

HmG is shared by picornaviruses, caliciviruses and 
flaviviruses, although the structures of each of these 
groups begin to diverge within the C-terminal segment 
of Motif G. Motif G is characterized by the conserved 
motif [T/S-x(1-2)-G], which is located near the outer edge of
the template tunnel. The motif may enforce the correct 
orientation of essential residues and a primer (35). Each 
flank of the homomorph contains amino acid residues that 
significantly affect the life cycle of the species. In PV, mu- 
tations at the N-terminal residue of the homomorph 
(D71A/E72A) are lethal (37). Mutations located outside 
the N-terminal edge of the motif (PV D105A/E108A) 
result in small plaques (37). Downstream from the 
C-terminal edge of the motif, there is a nuclear localiza- 
tion signal (NLS) in the picornaviruses and caliciviruses. 
The NLS is located two residues from the C-terminus of 
the homomorph and mutations in the NLS (K125A/ 
K126A/K127A and K127A/R128A/D129A) are lethal to 
PV (37). 

HmF discussion 

Previous research found that Motif F occurs in all RdRps 
(38), that it recognizes the incoming NTP (39), serves as 
the primary fidelity checkpoint for RdRp and reorients the 
proper triphosphate into a position for efficient catalysis 



(40). HmF is an extensive structure with surface exposure 
at both ends and near its mid-section at Motif F2 
(Figure 3A-D). Motif F2 (Figure 3E) is analogous to 
the loop in hmG that varies in composition and length; 
it is upstream of a highly conserved motif and is species- 
specific. The large size of this homomorph and its pos- 
itioning that transects the protein while maintaining 
contact with the template tunnel is consistent with its es- 
tablished role in transcription, which requires both 
fine-scale stability and large-scale mobility. Motif F3 
consists of mostly basic amino acid residues and forms 
the roof of the NTP entry tunnel (41); the characteristic 
conserved arg residue is essential to nucleotide binding 
(38). The required orientation of the F motifs would be 
stabilized by the loop formed by hmF and the double- 
stranded segment formed by the extension of the 
homomorph beyond the motifs. Both the N-terminal 
and C-terminal residues of the homomorph are exposed 
at the exterior surface of the protein. In PV, mutations of 
residues adjacent to the N-terminal are lethal: G 1494-1150 
(42) and H149A/K150A (37). 

HmA discussion 

The conserved residues of Motif A (in PV, D233 and 
D238) control the function of the metal ions at the 
active site (41,43), which perform the phosphotransfer 
essential to polymerase activity (41). D233 is a ligand to
the metal (44). D238 is essential to NTP binding (3). 
Similar functions for the residues of Motif A have been 
identified for HCV (45), HRV (45) and FMDV (2). Motif 
A is centered within a spring-like homomorph (Figure 4B- 
D). Each end terminates at the exterior surface of the 
protein, and the beginning and end of the homomorph 
terminate nearly opposite each other. Mutations in the 
N-terminal segment of the homomorph (in PV at 
E226A/E227A) result in small plaques (37), suggesting 
that these residues influence the rate of catalysis. This is 
the region where species-specific structures protrude from 
the homomorph (PV L224-L229). This position, relative 
to the conserved motif, is analogous to a similar structure 
in hmG and Motif F2. All these structures contain a 
segment that varies in length and composition by species 
and is located upstream from a highly conserved motif, 
essential to replication. 

An insertion at the C-terminal edge of Motif A 
(L241-i-S242) is lethal (Figure 4A) (42). The structure 
of the homomorph is highly conserved in this region, 
suggesting that the structural consequences of an insertion 
are not tolerated. The position of this lethal insertion is 
similar to the position of lethal mutations in Motif 
G, although the major effect in hmG may be the loss of 
the nuclear localization signal. Mutations near the 
C-terminal residue of the Motif A homomorph (PV 
G257) affect function: E254A/K255A is lethal (37), and 
the insertion I256-ile-G257 results in temperature sensitiv- 
ity (46). 

HmA provides a structural connection between the 
functional residues at the template tunnel and the 
exterior surface of the protein. Sequence residues
immediately adjacent to the motif have a high degree of
functionality. It is possible that the orientation of the
conserved segments that comprise the homomorph
would be affected by changes in the orientation of
residues at the edges of the motif. The N-terminal and
C-terminal segments of the homomorph are helices,
which are likely to be relatively rigid.

HmB discussion 

Motif B is near the center of a very large homomorph that 
contacts the exterior surface at nearly opposite positions. 
As stated by Bruenn (38), Motif B forms the base of the 
template-entry channel and may function in guiding the 
template entry into the active site. Choi et al. (17) 
observed that the highly conserved asn (N414 in BVDV) 
is conserved in all picornaviruses. Hansen et al. (47) found
that in HRV, N297 is involved in positioning NTP for 
recognition. Ferrer-Orta et al. (2) determined that the 
equivalent FMDV-N307 and D245 (Motif A) together 
are involved in ribonucleoside triphosphate (rNTP) selec- 
tion. Tao et al. (21) and Butcher et al. (20) proposed that 
Motif B interacts with the 2'-OH group on the incoming 
nucleotide. Korneeva and Cameron (48) determined that 
FMDV-N307 interacts with the C-terminal-OH in the 
uridylylation complex, but with the 2'-OH in the elong- 
ation complex. The role of Motif B in the mechanisms of 
active site closure has recently been described in detail by 
Gong and Peersen (9). These experiments document the 
role of the highly conserved asn in the motif in multiple 
species and suggest that structural alignment may be 
useful for the identification of potential functionally 
equivalent residues in structures that have R2R 
correspondences. 

StralSV structure analysis indicates that the structure of 
Motif B is highly conserved in all RdRps, unlike some of 
the other motifs that have unmatched R2R correspond- 
ences. This highly conserved structure is consistent with its 
role in NTP recognition. An insertion in Motif B at PV 
C290-S-S291 is lethal (42). Within the N-terminal segment 
of the hmB, the mutation in PV-K276L results in small 
plaques (49). The structural position of this mutation 
(within the homomorph and upstream from the motif) is 
similar to that of rate-affecting mutations in the 
homomorphs of Motifs G and A. At the C-terminal end 
of the homomorph, in BVDV (BVDV-F426), mutation of 
residues C427, S428 and R447 to ala reduces 
primer-dependent RNA elongation and abolishes 
de novo synthesis (17). 

HmC discussion 

Motif C is not a component of a larger structurally 
conserved segment, but has the same key features of the 
other homomorphs. It is folded in a manner that places 
the apex of the fold at the wall of the template tunnel, and 
both the N-terminus and C-terminus at the exterior 
surface of the protein (Figure 6B and C). Therefore, 
Motif C as defined in the literature comprises the 
homomorph. The absence of R2R correspondence 
adjacent to hmC indicates that the structures of the 
adjacent sequence segments are highly specific to each 
species. HmC is highly conserved in the RdRps and 



highly similar to the DNA-dependent polymerases. 
Although there is a sequence inversion in the birnaviruses 
(Motif C precedes Motif A), Figure 7B and C illustrates 
that despite the difference in sequence order, the 
homomorphs occupy a similar tertiary position. StralSV 
analysis indicates that the structure of Motif C is highly 
conserved in the RNA-dependent polymerases, though 
slightly different in the DNA-dependent polymerases 
(Figure 6A). 

Motif C is part of the classic 'RRM-fold' that forms the 
core of the palm domain of all these polymerases (together
with that part of Motif A that forms a β-sheet with Motif
C). Experimental studies have demonstrated that several
residues within Motif C are sensitive to the position and 
composition of mutants. The highly conserved residues, 
GDD, occur near the center of the motif. The primary 
function of these residues is to coordinate the metal ions 
associated with the incoming rNTP (43,45). In PV,
mutation of D to E in either or both positions (D328 or 
D329) is lethal (50). In HCV, mutation of G317A is also 
lethal (51). However, in birnaviruses the highly conserved 
residues are ADN, rather than GDD, and mutation to 
GDD increases RNA synthesis activity (5). Certain muta- 
tions immediately upstream from the GDD motif are 
lethal in PV: Y326[CHIMS] (50). This is similar to the 
effect of the L241-i-S242 at the downstream edge of 
Motif A in PV. Near the N-terminal end of Motif C in 
HCV (HCV T312), the mutation D311A characterizes 
chronic hepatitis (52). Mutation at the edge of a highly 
conserved structure seems to have a substantial effect on 
the viral life cycle. The R2R comparisons summarized in
these structure maps identify the types of sequence
variability that can occur while maintaining the same spatial
structure (Figure 6A) and demonstrate the
variations in composition that can be tolerated even
within a key functional motif. The R2R correspondence
of other RdRps with birnaviruses in Motif C (Figure 7A), 
despite the sequence inversion in birnaviruses (CAB) 
supports the premise that conservation of structure is a 
significant, if not dominant, factor in evolution. 

HmD discussion 

HmD is different from the other homomorphs in that it 
lies mostly on the surface of the protein (Figure 8C). Like 
the others, however, its terminal residues are located at a 
distinctive surface (Figure 8D). In the case of hmD, they 
come from an opposite surface rather than the interior of 
the protein. The N-terminal segment of hmD is more 
conserved than the motif itself, which forms the 
C-terminal segment. The motion of Motif D in the 
active state has not been captured by the existing struc- 
tures in PDB (40). Therefore, the lack of R2R correspond- 
ence in the motif may be a reflection of the limitations of 
the available structures. 

Residues within hmD perform varied functions. In PV,
for example, polymerases form an extensive lattice system
through polymerase-polymerase interactions; L342 and D349, located
within hmD, contribute to interface I of this lattice system
(53). The most highly conserved residue within the
homomorph is a gly (PV G351) at the N-terminal edge
of the motif and central to the homomorph; gly in this
position would facilitate the folding of the homomorph,
and is consistent with Cameron et al.'s (40) hypothesis
that Motif D may be the most dynamic structural 
element of RdRps and RTs. Another conserved residue 
is a lys near the C-terminal edge (PV-K359). Residues 
equivalent to PV-K359 supply a proton to the nucleotidyl 
transfer reaction that increases the rate constant for nu- 
cleotide addition by 50- to 1000-fold (40). In PV, within 
the motif, the insertion T353-t-M354 results in small 
plaques, likely due to delayed RNA synthesis (54). In 
other homomorphs, mutations that affect the rate of syn- 
thesis occur more commonly outside of the motifs. 
Immediately downstream from the homomorph, 
PV-T362I is an attenuating mutation for the Sabin 
vaccine (40). 

HmE discussion 

HmE is ~36 amino acids in length (Figure 9A), and well 
represented by all RNA-dependent members of the sample
set, except that no correspondence was found with HIV or 
TERT. These two species, however, are structurally 
matched to each other. There is considerable sequence 
similarity between PV and DENV within this 
homomorph, shown in the bottom segment of 
Figure 9A. Appleby et al. (55) determined that Motif E
is unique to RNA polymerases. 

HmE forms part of the NTP entry tunnel and has a 
considerable amount of exposure on the surface of the 
protein. The species-specific loop within hmE is at the 
outermost edge of the protein, a feature found in other 
homomorphs (G, F2 and A; Figures 2A, 3A and 4A, re- 
spectively). Huang et al. (25) found that the Motif E loop 
region acts as a pivot point for thumb subdomain 
movement upon template-primer binding. Motif E may 
also function in the proper positioning of the thumb 
relative to the palm (5). The turn of the loop projects 
into the active site cavity where it has been implicated in 
helping to position the C-terminal end of the primer 
strand for attachment to the α-phosphate of the NTP
during phosphoryl transfer (56). Motif E in HCV plays 
a role in binding the priming nucleotide (not the incoming 
nucleotides) (38); HCV has a longer loop (Figure 9A), 
possibly related to this function. In PV, the C-terminus
of hmE (R402) emerges from the protein into the
segment 28-SAFHYVFEG-36. This segment contains 
residues F30 and F34, which interact with W403 to 
maintain the polymerase structure (39). 

SUMMARY 

Comparisons of the tertiary structures of the RdRps of 18 
viral species indicated that most of the highly conserved 
residues essential to polymerase function are embedded in 
large sequence segments that are highly conserved struc- 
turally, yet disparate in composition. We have named 
these conserved segments 'homomorphs' and have 
identified the composition and length of each homomorph 
that includes previously recognized polymerase motifs 
(Table 2). We have demonstrated that the RNA 



polymerases have structural skeletons (frames) that are 
highly conserved, with flexible segments between them, 
and that extensive segments of structure similarity can 
be identified by the methods we have described. These 
methods are applicable to the studies of other groups of 
proteins, and we anticipate that by accessing structure 
similarity independent of sequence composition, skeletal 
frameworks will be found in other groups of proteins. 
Additionally, after structure similarity is identified, differ- 
ences between members of the group become readily 
apparent. 

All of the homomorphs included residues that connect 
the template tunnel or the NTP entry tunnel with the outer 
surface of the protein. Although some of the surface 
residues within these homomorphs have specific functional 
roles, as reported in the literature (see citations in previous 
paragraphs), we anticipate that they may all be important 
for polymerase function; the consistent occurrence of 
homomorphs embedding motifs — even when a defined 
sequence motif is small in size — suggests a structure- 
function relationship between the motif and its structur- 
ally conserved flanking regions. It would be interesting to 
explore the possibility that interactions at the surface of 
the protein (e.g. protein-protein contact at surface 
homomorph residues) may subtly affect function buried 
deep beneath, within the tunnel. Furthermore, each 
homomorph is either divided by or is separated from 
another homomorph by a flexible secondary structure. 
Identification of the span of each homomorph and the 
terminal residue enables us to identify specific residues 
on the surface that would not, in many cases, be otherwise 
noticed. By comparing experimental data with the surface 
location of the ends of the homomorphs, we have found 
that these are often the sites of key functional interactions 
of the protein. A paper describing these sites is in 
preparation. 

We have compared the effects of currently recognized 
mutations within the motifs and within the homomorphs. 
Most mutations within the motifs are function-specific, 
related to either a change in charge or size, and in most 
cases the mutations are lethal [mA (42), mB (42), mC
(50,51)]. Mutations outside of the motif (but within the
homomorph) are more often rate related and located in a 
segment that bulges from the homomorph by an amount 
that varies by species (Figures 2A, 3A and 4A). These
differences support the hypothesis that residues actively 
involved in template processing are essential to viability, 
and most of them are components of a consistent, stable 
structure that places and/or maintains them in their ap- 
propriate functional position. However, the practice of 
mutating residues to ala has resulted in a somewhat 'all 
or nothing' perspective of mutations. StralSV analysis can 
facilitate informed selection of alternative residues of 
various compositions, which could possibly affect replica- 
tion rates to different extents. Experiments involving this 
type of testing would enhance predictive models and may 
provide new insights for the design and development of 
medical countermeasures. 

The extension of all homomorphs from the template 
tunnel to the exterior of the protein was an unexpected 
finding. Its universality in the polymerase family suggests
a functional significance. Residues within the
homomorphs that were localized to the surface often 
had species-specific loops. These features have most likely
not been identified previously because of the
limitations of existing sequence and structure comparison
tools, in particular the lack of the ability to perform multi-species
comparisons of structures using overlapping windows of
a size determined by the user and to select the
criteria for R2R matches. The homomorphs as defined in
this work add structural clarity and context to sequence- 
based functional motifs previously observed by numerous 
authors performing comparative studies among polymer- 
ases. The structure maps created from the R2R corres- 
pondences identified by the StralSV algorithm provided 
a unique and informative perspective of structure and 
function in RdRps. They readily identify unique regions 
of each species and those shared by proteins within a 
family. These are features that would be useful for 
studies of any protein family. Based on the results of 
this study, it may be possible to define characteristic 
homomorphs for many other protein families, despite con- 
siderable sequence variation. It may be feasible to classify 
homomorphs in a manner analogous to the SCOP 
database, and in doing so provide new insight into 
protein evolution. 

The StralSV algorithm simultaneously, rapidly and 
quantitatively identifies the similarities and differences of 
the structural components of multiple species and 
provides an output that facilitates the comparison of 
three-dimensional structure information. StralSV enabled 
us to cluster protein segments that have the same tertiary 
structure, independent of sequence variability. In a sense, 
it is an analog of BLAST, although based on structure rather
than sequence. The precision of StralSV makes it easy to 
identify small differences between and within species. The 
ability to process multiple species at the same time can 
rapidly accelerate our understanding of differences 
between them. The identified structural associations may 
also facilitate the transfer of structure-related functional 
information among proteins. 

The traditional perspective of the relationship between 
the amino acid sequence of a protein and its tertiary struc- 
ture has been that sequence determines structure. Under 
this premise, sequence-based evolutionary studies and 
phylogeny would inherently incorporate structure. In 
this study of RdRps, we demonstrated that structure 
accommodates substantial sequence variability, and that 
highly diverse sequences can generate highly similar 
tertiary structures. Structure-based phylogeny may 
provide new perspectives of protein evolution.